Psychology 202b

Advanced Psychological Statistics

Study Guide for Midterm Exam, Spring 2011

When I prepare for an exam, I review the syllabus and ask myself, "What was really important in each section of the class? What did we spend a lot of time on (either in class or on homework assignments)?" With such questions in mind, let's review where we've been. After each major section, I have listed some questions that you can ask yourself to check your understanding. These questions do not represent an exhaustive list of material that might be covered on the exam.

We began the semester by learning a little about matrices. We discussed what matrices are, how to multiply them, specific forms of matrices (e.g., diagonal, upper and lower triangular, symmetric), the identity matrix, matrix inversion, determinants and singularity, and eigenanalysis. Some reasonable questions you might ask yourself about matrices include:

Can I multiply two matrices, or conclude that they do not conform for multiplication?

Can I calculate the inverse of a diagonal matrix?

Can I demonstrate that an inverse matrix really is an inverse?

Can I interpret a determinant with respect to the invertibility or singularity of a matrix?
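If it helps to check hand calculations, the matrix questions above can be sketched in a few lines of plain Python. This is only an illustration: the helper names (`conforms`, `matmul`, `diag_inverse`, `det2`) and the example matrices are made up for this sketch and do not come from the course materials.

```python
# Illustrative helpers with hypothetical names; matrices are lists of rows.

def conforms(a, b):
    """True if a*b is defined: columns of a must equal rows of b."""
    return len(a[0]) == len(b)

def matmul(a, b):
    """Multiply two matrices, or report that they do not conform."""
    if not conforms(a, b):
        raise ValueError("matrices do not conform for multiplication")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def diag_inverse(d):
    """Invert a diagonal matrix by taking reciprocals of the diagonal."""
    n = len(d)
    if any(d[i][i] == 0 for i in range(n)):
        # A zero on the diagonal makes the determinant zero (singular).
        raise ValueError("singular diagonal matrix has no inverse")
    return [[1.0 / d[i][i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Made-up example: a matrix times its inverse gives the identity matrix.
D = [[2.0, 0.0], [0.0, 4.0]]
D_inv = diag_inverse(D)
product = matmul(D, D_inv)

# Made-up example: the second row is twice the first, so the
# determinant is zero and the matrix is singular (not invertible).
S = [[1.0, 2.0], [2.0, 4.0]]
singular_det = det2(S)
```

Multiplying a matrix by its claimed inverse and checking for the identity matrix is the demonstration asked about above, and a zero determinant signals that no inverse exists.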

Next, we spent some time reviewing what we already knew about simple and multiple linear regression. We discussed assumptions, and added new material about estimation principles, matrix representation, collinearity, and diagnostics for outliers and influential observations. Some questions that might help you prepare for related exam items include:

Can I show how prediction using linear regression is represented as matrix multiplication?

Can I identify the crucial assumptions necessary for inference in linear regression, and interpret diagnostic plots relevant to those assumptions?

Can I interpret SAS output related to the identification of outliers, potentially influential cases, and cases that actually have a strong influence on the regression estimates?

Can I interpret R transcripts relevant to diagnosing collinearity?
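As a reminder of the matrix representation, here is a sketch for a two-predictor regression with n cases. The symbols are assumptions of this sketch rather than notation taken from the lectures: X is the design matrix with a leading column of ones for the intercept, and b is the vector of least-squares estimates.

```latex
\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}
\quad\text{with}\quad
\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},
\qquad
\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} \\
1 & x_{21} & x_{22} \\
\vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2}
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
```

Each row of the product is the familiar prediction equation, b_0 + b_1 x_{i1} + b_2 x_{i2}, for one case.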

We discussed several types of confidence interval: intervals for a slope, for a conditional mean, and for an individual outcome. Some relevant questions include:

Given the (X'X)^-1 matrix and the error mean square, can I identify the standard error of a particular slope?

Can I construct the standard error of a conditional mean from that same information for a two-predictor regression? (I wouldn't ask you to do that for more than two predictors on an exam.)

Can I calculate the standard error for individual prediction, given that same information?

Can I use any of those standard errors to calculate a corresponding confidence interval?

Can I interpret such a confidence interval, avoiding the usual pitfalls of interpretation?
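The three standard errors above can be sketched in plain Python as follows. Everything concrete here is an assumption for illustration: the function names are hypothetical, the (X'X)^-1 matrix and error mean square are made-up numbers, and the critical t value is assumed to be looked up separately for the error degrees of freedom.

```python
import math

def quad_form(x, m):
    """Compute x' M x for a vector x and a square matrix M."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))

def se_slope(xtx_inv, mse, j):
    """SE of coefficient j: sqrt(MSE times the j-th diagonal element)."""
    return math.sqrt(mse * xtx_inv[j][j])

def se_conditional_mean(xtx_inv, mse, x0):
    """SE of the estimated mean of Y at predictor values x0."""
    return math.sqrt(mse * quad_form(x0, xtx_inv))

def se_individual(xtx_inv, mse, x0):
    """SE for predicting one new outcome at x0 (adds 1 for new error)."""
    return math.sqrt(mse * (1.0 + quad_form(x0, xtx_inv)))

def confidence_interval(estimate, se, t_crit):
    """Interval of the form estimate +/- t_crit * SE."""
    return (estimate - t_crit * se, estimate + t_crit * se)

# Made-up inputs for a two-predictor model (intercept, x1, x2).
xtx_inv = [[0.50, 0.00, 0.00],
           [0.00, 0.25, 0.00],
           [0.00, 0.00, 0.04]]
mse = 4.0
x0 = [1.0, 2.0, 5.0]  # leading 1 for the intercept, then x1 = 2, x2 = 5

slope_se = se_slope(xtx_inv, mse, 1)             # SE of the x1 slope
mean_se = se_conditional_mean(xtx_inv, mse, x0)
indiv_se = se_individual(xtx_inv, mse, x0)

# t_crit = 2.0 is a placeholder, not a looked-up critical value.
slope_ci = confidence_interval(1.5, slope_se, 2.0)
```

Note that the standard error for individual prediction is always larger than the one for the conditional mean, because it adds the variability of a single new observation.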

We discussed transformations and the Box-Cox procedure for identifying the optimal normalizing transformation. We noted that transformations are most often used to correct heteroscedasticity or non-linearity rather than non-normality. However, the best normalizing transformation will sometimes correct heteroscedasticity as well. Some relevant questions include:

Can I identify from graphics situations that might benefit from transformations?

Can I identify the appropriate transformation from a Box-Cox likelihood curve?
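For reference when reading a Box-Cox likelihood curve, the usual form of the transformation family is sketched below, with the power parameter written as lambda (a notational assumption of this sketch) and y assumed to be positive.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1.5ex]
\ln y, & \lambda = 0
\end{cases}
```

The value of lambda at the peak of the likelihood curve suggests the transformation: lambda near 1 means no transformation is needed, near 0.5 suggests a square root, near 0 a logarithm, and near -1 a reciprocal.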

We discussed sequential regression. Some questions that could be useful as you prepare include:

Could I perform sequential inference from a set of increasingly complex regression models?

Can I identify variables that should be entered sooner or later given a particular research situation?
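Sequential inference usually comes down to an F test on the change in error sum of squares between one model and the next, more complex one. The following is a minimal sketch, assuming that is the test intended here; the function name and the numbers are made up for illustration.

```python
def f_change(sse_reduced, sse_full, df_error_full, n_added):
    """F statistic for adding n_added predictors to a nested model.

    sse_reduced   -- error sum of squares for the simpler model
    sse_full      -- error sum of squares for the more complex model
    df_error_full -- error degrees of freedom for the more complex model
    n_added       -- number of predictors added at this step
    """
    numerator = (sse_reduced - sse_full) / n_added
    denominator = sse_full / df_error_full
    return numerator / denominator

# Made-up example: adding 2 predictors lowers SSE from 120 to 100,
# and the larger model has 50 error degrees of freedom.
f_stat = f_change(sse_reduced=120.0, sse_full=100.0,
                  df_error_full=50, n_added=2)
```

The resulting statistic is referred to an F distribution with `n_added` numerator and `df_error_full` denominator degrees of freedom.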

We discussed the advantages and many disadvantages of stepwise regression. Useful preparation for related exam questions would include thinking carefully about situations where stepwise regression is appropriate, identifying steps that should be taken to maximize its utility (e.g., cross validation), and reviewing the several reasons why it is not usually a very good idea.

We reviewed regression with categorical predictors and mixtures of categorical and continuous predictors. We noted that including interactions in such models effectively allows the simultaneous estimation of a separate regression equation for each class identified by the categorical predictors. Some relevant review questions include:

Given SAS or R regression output with interactions, could I identify the separate regression equations for different categories?

Could I conduct a test of the collective significance of different slopes in such a context, given output from regression analyses conducted with and without the interactions?
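To see how the separate equations fall out of the output, consider a sketch with one continuous predictor x and one two-category predictor. The sketch assumes 0/1 dummy coding (written here as d); other coding schemes change the arithmetic but not the idea.

```latex
\hat{y} = b_0 + b_1 x + b_2 d + b_3 (x \cdot d)
\quad\Longrightarrow\quad
\begin{cases}
d = 0: & \hat{y} = b_0 + b_1 x \\
d = 1: & \hat{y} = (b_0 + b_2) + (b_1 + b_3)\, x
\end{cases}
```

Here b_2 is the difference in intercepts and b_3 is the difference in slopes. With more than two categories there are several interaction terms, and testing them collectively means comparing the model with all of them to the model with none of them.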

We discussed power analysis for regression. Power analysis requires access to software such as R or G*Power, so it is impractical to ask you to do such an analysis on the exam. However, you should be able to interpret output from R or G*Power that is relevant to power analysis.

Finally, we discussed continuous interactions. We talked about the importance of centering continuous variables in this context, and we discussed a couple of different ways to try to understand continuous interactions by treating one of the variables as if it were categorical (the ±1 SD approach, and the process of approximating the continuous interaction using a categorical version of one of the continuous variables). You should be able to identify whether a plot based on one of those methods matches the output from the continuous regression.
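One way to check a ±1 SD plot against regression output is to compute the implied slopes directly. The sketch below assumes a model of the form y-hat = b0 + b1*x + b2*z + b3*x*z with z centered at its mean; the function name and the coefficient values are made up for illustration.

```python
def simple_slopes(b1, b3, sd_z):
    """Slope of x at z = -1 SD, the mean (0 after centering), and +1 SD.

    In y-hat = b0 + b1*x + b2*z + b3*x*z, the slope of x at a given
    value of z is b1 + b3*z.
    """
    points = (("-1 SD", -sd_z), ("mean", 0.0), ("+1 SD", sd_z))
    return {label: b1 + b3 * z for label, z in points}

# Made-up coefficients: b1 = 0.5, interaction b3 = 0.25, SD of z = 2.
slopes = simple_slopes(b1=0.5, b3=0.25, sd_z=2.0)
```

If the lines in a plot do not have slopes close to these values, the plot does not match the regression output.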